---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-05-14
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries, using case fatality ratio estimates based on data from the ECDC and correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see the Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly using the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 38% (27%-53%) 5,226 130
Albania 79% (39%-100%) 880 31
Algeria 35% (23%-50%) 6,253 522
Andorra 23% (13%-39%) 760 49
Argentina 21% (16%-28%) 6,866 329
Armenia 74% (50%-98%) 3,718 48
Australia 86% (52%-100%) 6,975 98
Austria 39% (25%-61%) 15,964 624
Azerbaijan 85% (59%-100%) 2,758 35
Bahamas 48% (11%-99%) 94 11
Bangladesh 96% (75%-100%) 17,822 269
Belarus 99% (93%-100%) 25,825 146
Belgium 12% (9.5%-15%) 53,981 8,843
Bolivia 20% (15%-28%) 3,148 142
Bosnia and Herzegovina 10% (5.9%-16%) 2,181 120
Brazil 13% (10%-15%) 188,974 13,149
Bulgaria 26% (18%-40%) 2,069 96
Burkina Faso 23% (14%-40%) 773 51
Cameroon 74% (22%-100%) 2,800 136
Canada 14% (11%-17%) 72,278 5,304
Chad 5.7% (3.4%-10%) 372 42
Chile 90% (75%-99%) 34,381 346
China 99% (100%-100%) 84,024 4,637
Colombia 26% (20%-33%) 12,930 509
Congo 63% (22%-100%) 341 11
Côte d'Ivoire 85% (57%-100%) 1,912 24
Croatia 20% (11%-32%) 2,213 94
Cuba 33% (21%-51%) 1,810 79
Cyprus 73% (38%-100%) 905 17
Czechia 35% (26%-46%) 8,269 290
Democratic Republic of the Congo 24% (12%-42%) 1,242 50
Denmark 34% (24%-46%) 10,667 533
Dominican Republic 51% (38%-65%) 11,196 409
Ecuador 5.9% (4.8%-7.2%) 30,486 2,334
Egypt 31% (23%-40%) 10,431 556
El Salvador 51% (26%-91%) 1,112 20
Estonia 34% (21%-52%) 1,751 61
Finland 38% (23%-58%) 6,054 284
France 9.5% (7.7%-11%) 140,734 27,074
Germany 24% (19%-29%) 172,239 7,723
Ghana 95% (81%-100%) 5,408 24
Greece 32% (20%-50%) 2,760 155
Guatemala 44% (25%-73%) 1,342 29
Guernsey 42% (13%-95%) 252 13
Honduras 20% (13%-32%) 2,255 123
Hungary 11% (8%-16%) 3,380 436
Iceland 87% (55%-100%) 1,802 10
India 35% (28%-42%) 78,003 2,549
Indonesia 26% (18%-35%) 15,438 1,028
Iran 34% (28%-41%) 112,725 6,783
Iraq 55% (27%-92%) 3,032 115
Ireland 28% (21%-37%) 23,401 1,497
Isle of Man 21% (7.9%-61%) 332 23
Israel 82% (63%-99%) 16,548 264
Italy 15% (12%-18%) 222,104 31,106
Japan 18% (13%-24%) 16,079 687
Jersey 14% (6.6%-33%) 296 27
Kazakhstan 97% (85%-100%) 5,571 32
Kenya 19% (11%-29%) 737 40
Kosovo 46% (26%-77%) 919 29
Kuwait 84% (61%-100%) 11,028 82
Kyrgyzstan 74% (36%-100%) 1,082 12
Latvia 53% (25%-95%) 951 19
Lebanon 55% (29%-96%) 878 26
Liberia 36% (8.7%-91%) 213 20
Lithuania 40% (24%-65%) 1,505 54
Luxembourg 51% (33%-70%) 3,904 103
Malaysia 95% (76%-100%) 6,779 111
Mali 24% (15%-38%) 758 44
Mauritius 60% (16%-100%) 332 10
Mexico 7.9% (6.5%-9.5%) 40,186 4,220
Moldova 29% (21%-38%) 5,406 199
Morocco 95% (79%-100%) 6,512 188
Netherlands 16% (13%-20%) 43,211 5,562
New Zealand 46% (22%-85%) 1,147 21
Niger 18% (7.9%-32%) 860 49
Nigeria 25% (18%-33%) 4,971 164
North Macedonia 26% (17%-42%) 1,694 95
Norway 38% (16%-67%) 8,158 229
Oman 95% (80%-100%) 4,019 17
Pakistan 43% (33%-54%) 35,788 770
Panama 49% (35%-65%) 8,944 256
Paraguay 84% (49%-100%) 740 11
Peru 42% (33%-51%) 76,306 2,169
Philippines 21% (16%-26%) 11,618 772
Poland 26% (20%-33%) 17,204 861
Portugal 40% (31%-50%) 28,132 1,175
Puerto Rico 33% (21%-49%) 2,329 115
Qatar 93% (68%-100%) 26,539 14
Romania 18% (14%-22%) 16,002 1,016
Russia 90% (78%-99%) 242,271 2,212
San Marino 75% (40%-100%) 643 41
Saudi Arabia 99% (93%-100%) 44,830 273
Serbia 87% (61%-100%) 10,295 222
Singapore 94% (74%-100%) 25,346 21
Sint Maarten 13% (4.2%-34%) 77 15
Slovakia 64% (37%-98%) 1,469 27
Slovenia 17% (11%-25%) 1,463 103
Somalia 26% (15%-46%) 1,219 52
South Africa 45% (34%-59%) 12,074 219
South Korea 42% (17%-79%) 10,991 260
Spain 16% (13%-19%) 228,691 27,104
Sudan 18% (11%-27%) 1,818 90
Sweden 15% (11%-18%) 27,909 3,460
Switzerland 25% (19%-32%) 30,330 1,563
Thailand 74% (50%-99%) 3,017 56
Tunisia 54% (25%-97%) 1,032 45
Turkey 66% (53%-79%) 143,114 3,952
Ukraine 36% (26%-50%) 16,425 439
United Arab Emirates 88% (67%-100%) 20,386 206
United Kingdom 16% (13%-19%) 229,705 33,186
United Republic of Tanzania 43% (22%-79%) 509 21
United States of America 33% (27%-39%) 1,390,746 84,133
Uruguay 42% (20%-83%) 719 19
Uzbekistan 91% (66%-100%) 2,612 11
Venezuela 82% (44%-100%) 440 10

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. 95% confidence intervals are calculated using an exact binomial test.
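As an illustration of how an exact binomial interval on a CFR can be turned into an interval on the proportion reported, the following is a minimal sketch and not the authors' code. It assumes that deaths are treated as binomial out of the (rounded) number of cases with known outcomes, that the interval is of the Clopper-Pearson type, and that the reported proportion is capped at 100%. The function names and the example counts are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)

def solve_increasing(func):
    """Bisection root of func on (0, 1), assuming func is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for a binomial proportion x / n."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper

def reporting_interval(deaths, known_outcomes, baseline_cfr=0.014):
    """Map a CFR interval to a reported-proportion interval (assumed cap at 1)."""
    cfr_low, cfr_high = clopper_pearson(deaths, known_outcomes)
    # A higher CFR implies a lower proportion reported, so the bounds swap.
    rep_low = min(1.0, baseline_cfr / cfr_high)
    rep_high = 1.0 if cfr_low == 0.0 else min(1.0, baseline_cfr / cfr_low)
    return rep_low, rep_high

# Hypothetical counts: 50 deaths among 1,000 cases with known outcomes.
rep_low, rep_high = reporting_interval(50, 1000)
print(f"reported: 28% ({rep_low:.0%}-{rep_high:.0%})")
```

Because the reported proportion is inversely related to the CFR, the upper CFR bound gives the lower bound on reporting and vice versa.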

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated hospitalisation-to-death delay, based on data from the COVID-19 outbreak in Wuhan, China, between 17th December 2019 and 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
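To make the delay distribution concrete, the sketch below converts the stated mean (13 days) and standard deviation (12.7 days) into lognormal log-scale parameters and discretises the result into daily probabilities. The daily binning (probability of a delay between j and j + 1 days) and the 60-day cut-off are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def lognormal_params(mean, sd):
    """Convert the mean and sd of a lognormal variable to (meanlog, sdlog)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal variable."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def discretise_delay(mean=13.0, sd=12.7, max_days=60):
    """Daily delay probabilities P(j <= delay < j + 1) for j = 0..max_days.

    The binning convention and the truncation at max_days are assumptions.
    """
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_days + 1)]

delay_pmf = discretise_delay()
print(f"probability mass within 61 days: {sum(delay_pmf):.3f}")
```

With these parameters, a small share of the probability mass lies beyond the cut-off, so the truncated probabilities sum to slightly less than one.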

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
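The correction can be sketched numerically as follows. This is a minimal illustration with made-up incidence data and a toy delay distribution, not the authors' implementation. It assumes the cCFR is computed as cumulative deaths divided by the cumulative number of cases with known outcomes, where the daily known outcomes are the sum over \(j\) of \(c_{t-j} f_j\).

```python
def known_outcomes(cases, delay_pmf):
    """For each day t, compute sum_j c_{t-j} * f_j over the delays available."""
    known = []
    for t in range(len(cases)):
        max_delay = min(t, len(delay_pmf) - 1)
        known.append(sum(cases[t - j] * delay_pmf[j] for j in range(max_delay + 1)))
    return known

def cfr_estimates(cases, deaths, delay_pmf):
    """Return (nCFR, cCFR) from daily case and death incidence."""
    total_known = sum(known_outcomes(cases, delay_pmf))
    ncfr = sum(deaths) / sum(cases)   # naive: deaths over all reported cases
    ccfr = sum(deaths) / total_known  # corrected: deaths over known outcomes
    return ncfr, ccfr

# Illustrative data only: a growing outbreak and a toy four-day delay distribution.
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 1, 2]
toy_delay_pmf = [0.1, 0.2, 0.3, 0.4]

ncfr, ccfr = cfr_estimates(cases, deaths, toy_delay_pmf)
print(f"nCFR = {ncfr:.2%}, cCFR = {ccfr:.2%}")
```

In a growing outbreak most cases are recent and their outcomes are not yet known, so the corrected denominator is smaller than the reported case total and the cCFR exceeds the nCFR.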

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR, and use it to approximate the potential level of under-reporting in each country. Specifically, for each country we calculate \(\frac{1.4\%}{\text{cCFR}}\) to estimate the approximate fraction of cases reported.
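As a worked example of this ratio, the sketch below divides the baseline CFR by a country's cCFR. Capping the result at 100% is an assumption, made here because a proportion reported cannot exceed one. The function name and the example cCFR values are hypothetical.

```python
BASELINE_CFR = 0.014  # baseline CFR of 1.4% assumed in the text

def proportion_reported(ccfr, baseline_cfr=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline CFR / cCFR."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    # Assumed cap: a cCFR below the baseline is read as full reporting.
    return min(1.0, baseline_cfr / ccfr)

# Hypothetical cCFR values for illustration.
for ccfr in (0.07, 0.028, 0.010):
    print(f"cCFR {ccfr:.1%} -> about {proportion_reported(ccfr):.0%} of cases reported")
```

A cCFR of 7%, for instance, is five times the baseline, which under this approach implies that roughly one in five symptomatic cases was reported.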

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the libraries greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal(-1, 1)}: \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal(4, 0.5)}: \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal(0, 0.5)}: \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\) and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important because some countries are known to have reported an unusually large number of deaths on a single day, due to past under-reporting.
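The two kernel components can be illustrated with a plain-Python sketch of the resulting covariance matrix. This uses the textbook squared-exponential and white-noise forms and does not reproduce the greta.gp API or its exact parameterisation. The parameter values are illustrative choices near the prior medians, not fitted values, and the function names are hypothetical.

```python
import math

def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))

def observation_kernel(i, j, noise_variance):
    """i.i.d. noise: contributes only on the diagonal (same observation)."""
    return noise_variance if i == j else 0.0

def covariance_matrix(times, variance, lengthscale, noise_variance):
    """Sum of the reporting and observation kernels over all pairs of days."""
    n = len(times)
    return [[reporting_kernel(times[i], times[j], variance, lengthscale)
             + observation_kernel(i, j, noise_variance)
             for j in range(n)]
            for i in range(n)]

# Illustrative values: exp(-1) and exp(4) are the medians of the two
# lognormal priors; the noise variance of 0.1 is arbitrary.
days = list(range(5))
K = covariance_matrix(days, variance=math.exp(-1), lengthscale=math.exp(4),
                      noise_variance=0.1)
print([round(v, 4) for v in K[0]])
```

With a lengthscale of around 55 days, neighbouring days are almost perfectly correlated under the reporting kernel, so short-lived spikes in the death series are absorbed by the noise component rather than by the smooth trend.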

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
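A minimal sketch of such a likelihood is given below. It assumes, for illustration, that the expected deaths on a given day equal the baseline CFR multiplied by the number of cases with known outcomes, scaled up by the proportion of cases reported on that day. This is one reading of the description and not necessarily the exact form used in the fitted model. The function names and data are hypothetical.

```python
import math

def expected_deaths(known_cases, reporting, baseline_cfr=0.014):
    """Expected deaths per day under the assumed form: CFR * known cases / reporting."""
    return [baseline_cfr * k / r for k, r in zip(known_cases, reporting)]

def poisson_log_likelihood(observed, expected):
    """Sum of log Poisson probabilities of the observed counts given the rates."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1)
               for y, lam in zip(observed, expected))

# Illustrative data: known-outcome cases and observed deaths on two days.
known_cases = [1000, 2000]
observed = [28, 112]

# Two candidate reporting trajectories: the first reproduces the observed
# deaths exactly, the second assumes full reporting on both days.
for reporting in ([0.5, 0.25], [1.0, 1.0]):
    rates = expected_deaths(known_cases, reporting)
    print(reporting, round(poisson_log_likelihood(observed, rates), 2))
```

The reporting trajectory whose implied death counts match the observations receives the higher log-likelihood, which is what drives the fit towards lower reporting estimates when deaths are high relative to reported cases.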

Adjusting case counts for under-reporting

We adjust the number of cases reported each day, as pulled from the ECDC. Specifically, we divide each day's case numbers by the “proportion of cases reported” estimate that we calculate for that day and country.
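The adjustment amounts to a single division per day, sketched below with made-up numbers. The interval handling is an assumption: the bounds of the adjusted estimate are obtained by dividing by the opposite bounds of the reporting estimate. The function name is hypothetical.

```python
def adjust_cases(reported, prop_mid, prop_low, prop_high):
    """Divide daily reported cases by the estimated proportion reported.

    Dividing by the upper reporting bound gives the lower bound on true
    cases, and dividing by the lower reporting bound gives the upper bound.
    """
    adjusted = []
    for cases, mid, low, high in zip(reported, prop_mid, prop_low, prop_high):
        adjusted.append({
            "estimate": cases / mid,
            "lower": cases / high,
            "upper": cases / low,
        })
    return adjusted

# Illustrative data: two days of reported cases with reporting estimates
# of 25% (20%-40%) and 50% (40%-80%).
result = adjust_cases([100, 200], [0.25, 0.5], [0.2, 0.4], [0.4, 0.8])
for day in result:
    print({name: round(value) for name, value in day.items()})
```

On the first day, 100 reported cases with an estimated 25% reporting rate correspond to roughly 400 symptomatic cases.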

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time-series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that have enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.